Recurrent Video Restoration Transformer with Guided Deformable Attention
Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, each with its own merits and drawbacks. Typically, the former has the advantage of temporal information fusion, but it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames, but it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT.
Supplementary Material for Recurrent Video Restoration Transformer with Guided Deformable Attention
In this supplementary material, we first give more details on the training and testing datasets, as well as the evaluation metrics. Then, we provide more visual comparisons of different methods. For video super-resolution, we train the model on two different training datasets for scale factor 4. First, we generate low-resolution images with the MATLAB imresize function (i.e., bicubic degradation) and train the model on REDS [8]. REDS4 [17] (i.e., clips 000, 011, 015, 020) is used as the test set. Second, we train the model on Vimeo-90K [18] with two different degradations: bicubic downsampling and blur downsampling (Gaussian blur with σ = 1.6 followed by subsampling).
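The blur-downsampling degradation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' exact pipeline: it assumes a single-channel float image, a separable Gaussian kernel truncated at roughly 4σ, and reflect padding; real pipelines apply the same operation per RGB channel, and the bicubic variant would instead use a resampling routine matching MATLAB's imresize.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    # 1-D Gaussian, truncated at ~4*sigma (a common convention), normalized to sum to 1
    if radius is None:
        radius = int(4 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur_downsample(img, sigma=1.6, scale=4):
    """Blur-downsampling (BD) degradation sketch: Gaussian blur with the
    given sigma, then keep every `scale`-th pixel. `img` is an H x W float
    array (grayscale for simplicity)."""
    k = gaussian_kernel1d(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode='reflect')
    # Separable convolution: filter rows, then columns
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, rows)
    # Subsample by the scale factor (x4 here)
    return blurred[::scale, ::scale]
```

For a 256×256 high-resolution frame and scale factor 4, this produces a 64×64 low-resolution frame; since the kernel is normalized, flat regions keep their intensity after blurring.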